{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "359697d5",
   "metadata": {},
   "source": [
    "# LangChain Cookbook 👨‍🍳👩‍🍳"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11d788b0",
   "metadata": {},
   "source": [
    "*This cookbook is based off the [LangChain Conceptual Documentation](https://docs.langchain.com/docs/)*\n",
    "\n",
    "**Goal:** Provide an introductory understanding of the components and use cases of LangChain via [ELI5](https://www.dictionary.com/e/slang/eli5/#:~:text=ELI5%20is%20short%20for%20%E2%80%9CExplain,a%20complicated%20question%20or%20problem.) examples and code snippets. For use cases check out [part 2](https://github.com/gkamradt/langchain-tutorials/blob/main/LangChain%20Cookbook%20Part%202%20-%20Use%20Cases.ipynb). See [video tutorial](https://www.youtube.com/watch?v=2xxziIWmaSA) of this notebook.\n",
    "\n",
    "\n",
    "**Links:**\n",
    "* [LC Conceptual Documentation](https://docs.langchain.com/docs/)\n",
    "* [LC Python Documentation](https://python.langchain.com/en/latest/)\n",
    "* [LC Javascript/Typescript Documentation](https://js.langchain.com/docs/)\n",
    "* [LC Discord](https://discord.gg/6adMQxSpJS)\n",
    "* [www.langchain.com](https://langchain.com/)\n",
    "* [LC Twitter](https://twitter.com/LangChainAI)\n",
    "\n",
    "\n",
    "### **What is LangChain?**\n",
    "> LangChain is a framework for developing applications powered by language models.\n",
    "\n",
    "**~~TL~~DR**: LangChain makes the complicated parts of working & building with AI models easier. It helps do this in two ways:\n",
    "\n",
    "1. **Integration** - Bring external data, such as your files, other applications, and api data, to your LLMs\n",
    "2. **Agency** - Allow your LLMs to interact with it's environment via decision making. Use LLMs to help decide which action to take next\n",
    "\n",
    "### **Why LangChain?**\n",
    "1. **Components** - LangChain makes it easy to swap out abstractions and components necessary to work with language models.\n",
    "\n",
    "2. **Customized Chains** - LangChain provides out of the box support for using and customizing 'chains' - a series of actions strung together.\n",
    "\n",
    "3. **Speed 🚢** - This team ships insanely fast. You'll be up to date with the latest LLM features.\n",
    "\n",
    "4. **Community 👥** - Wonderful discord and community support, meet ups, hackathons, etc.\n",
    "\n",
    "Though LLMs can be straightforward (text-in, text-out) you'll quickly run into friction points that LangChain helps with once you develop more complicated applications.\n",
    "\n",
    "*Note: This cookbook will not cover all aspects of LangChain. It's contents have been curated to get you to building & impact as quick as possible. For more, please check out [LangChain Conceptual Documentation](https://docs.langchain.com/docs/)*\n",
    "\n",
    "*Update Oct '23: This notebook has been expanded from it's original form*\n",
    "\n",
    "You'll need an OpenAI api key to follow this tutorial. You can have it as an environement variable, in an .env file where this jupyter notebook lives, or insert it below where 'YourAPIKey' is. Have if you have questions on this, put these instructions into [ChatGPT](https://chat.openai.com/)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "e9815081",
   "metadata": {
    "hide_input": false
   },
   "outputs": [],
   "source": [
    "from dotenv import load_dotenv\n",
    "import os\n",
    "\n",
    "load_dotenv()\n",
    "\n",
    "openai_api_key=os.getenv('OPENAI_API_KEY', 'YourAPIKey')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05bb564d",
   "metadata": {},
   "source": [
    "# LangChain Components\n",
    "\n",
    "## Schema - Nuts and Bolts of working with Large Language Models (LLMs)\n",
    "\n",
    "### **Text**\n",
    "The natural language way to interact with LLMs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "8e0dc06c",
   "metadata": {
    "hide_input": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'What day comes after Friday?'"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# You'll be working with simple strings (that'll soon grow in complexity!)\n",
    "my_text = \"What day comes after Friday?\"\n",
    "my_text"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f39eb39",
   "metadata": {},
   "source": [
    "### **Chat Messages**\n",
    "Like text, but specified with a message type (System, Human, AI)\n",
    "\n",
    "* **System** - Helpful background context that tell the AI what to do\n",
    "* **Human** - Messages that are intented to represent the user\n",
    "* **AI** - Messages that show what the AI responded with\n",
    "\n",
    "For more, see OpenAI's [documentation](https://platform.openai.com/docs/guides/chat/introduction)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "99b0935b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chat_models import ChatOpenAI\n",
    "from langchain.schema import HumanMessage, SystemMessage, AIMessage\n",
    "\n",
    "# This it the language model we'll use. We'll talk about what we're doing below in the next section\n",
    "chat = ChatOpenAI(temperature=.7, openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a2d2f7af",
   "metadata": {},
   "source": [
    "Now let's create a few messages that simulate a chat experience with a bot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "878d6a36",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AIMessage(content='You could try a caprese salad with fresh tomatoes, mozzarella, and basil.')"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chat(\n",
    "    [\n",
    "        SystemMessage(content=\"You are a nice AI bot that helps a user figure out what to eat in one short sentence\"),\n",
    "        HumanMessage(content=\"I like tomatoes, what should I eat?\")\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0a425aaa",
   "metadata": {},
   "source": [
    "You can also pass more chat history w/ responses from the AI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "8fd3fe88",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AIMessage(content='You should also explore the charming streets of the Old Town and indulge in delicious French cuisine.')"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chat(\n",
    "    [\n",
    "        SystemMessage(content=\"You are a nice AI bot that helps a user figure out where to travel in one short sentence\"),\n",
    "        HumanMessage(content=\"I like the beaches where should I go?\"),\n",
    "        AIMessage(content=\"You should go to Nice, France\"),\n",
    "        HumanMessage(content=\"What else should I do when I'm there?\")\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff5ee37a",
   "metadata": {},
   "source": [
    "You can also exclude the system message if you want"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "238a49f6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AIMessage(content='Friday')"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chat(\n",
    "    [\n",
    "        HumanMessage(content=\"What day comes after Thursday?\")\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66bf9634",
   "metadata": {},
   "source": [
    "### **Documents**\n",
    "An object that holds a piece of text and metadata (more information about that text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "3bbf58b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.schema import Document"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "6ad9bef6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Document(page_content=\"This is my document. It is full of text that I've gathered from other places\", metadata={'my_document_id': 234234, 'my_document_source': 'The LangChain Papers', 'my_document_create_time': 1680013019})"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Document(page_content=\"This is my document. It is full of text that I've gathered from other places\",\n",
    "         metadata={\n",
    "             'my_document_id' : 234234,\n",
    "             'my_document_source' : \"The LangChain Papers\",\n",
    "             'my_document_create_time' : 1680013019\n",
    "         })"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3bd19754",
   "metadata": {},
   "source": [
    "But you don't have to include metadata if you don't want to"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "0798d3ca",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Document(page_content=\"This is my document. It is full of text that I've gathered from other places\")"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Document(page_content=\"This is my document. It is full of text that I've gathered from other places\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e462b5d",
   "metadata": {},
   "source": [
    "## Models - The interface to the AI brains"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b27fe982",
   "metadata": {},
   "source": [
    "###  **Language Model**\n",
    "A model that does text in ➡️ text out!\n",
    "\n",
    "*Check out how I changed the model I was using from the default one to ada-001 (a very cheap, low performing model). See more models [here](https://platform.openai.com/docs/models)*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "74b1a72a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.llms import OpenAI\n",
    "\n",
    "llm = OpenAI(model_name=\"text-ada-001\", openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "6399c295",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'\\n\\nSaturday'"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "llm(\"What day comes after Friday?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3ef89bfa",
   "metadata": {},
   "source": [
    "### **Chat Model**\n",
    "A model that takes a series of messages and returns a message output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "bf091777",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chat_models import ChatOpenAI\n",
    "from langchain.schema import HumanMessage, SystemMessage, AIMessage\n",
    "\n",
    "chat = ChatOpenAI(temperature=1, openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "f4260711",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AIMessage(content='Why did the math book go to New York? Because it had too many problems and needed a change of scenery!')"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chat(\n",
    "    [\n",
    "        SystemMessage(content=\"You are an unhelpful AI bot that makes a joke at whatever the user says\"),\n",
    "        HumanMessage(content=\"I would like to go to New York, how should I do this?\")\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05c028f9",
   "metadata": {},
   "source": [
    "### Function Calling Models\n",
    "\n",
    "[Function calling models](https://openai.com/blog/function-calling-and-other-api-updates) are similar to Chat Models but with a little extra flavor. They are fine tuned to give structured data outputs.\n",
    "\n",
    "This comes in handy when you're making an API call to an external service or doing extraction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "1020ff45",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AIMessage(content='', additional_kwargs={'function_call': {'name': 'get_current_weather', 'arguments': '{\\n  \"location\": \"Boston, MA\"\\n}'}})"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chat = ChatOpenAI(model='gpt-3.5-turbo-0613', temperature=1, openai_api_key=openai_api_key)\n",
    "\n",
    "output = chat(messages=\n",
    "     [\n",
    "         SystemMessage(content=\"You are an helpful AI bot\"),\n",
    "         HumanMessage(content=\"What’s the weather like in Boston right now?\")\n",
    "     ],\n",
    "     functions=[{\n",
    "         \"name\": \"get_current_weather\",\n",
    "         \"description\": \"Get the current weather in a given location\",\n",
    "         \"parameters\": {\n",
    "             \"type\": \"object\",\n",
    "             \"properties\": {\n",
    "                 \"location\": {\n",
    "                     \"type\": \"string\",\n",
    "                     \"description\": \"The city and state, e.g. San Francisco, CA\"\n",
    "                 },\n",
    "                 \"unit\": {\n",
    "                     \"type\": \"string\",\n",
    "                     \"enum\": [\"celsius\", \"fahrenheit\"]\n",
    "                 }\n",
    "             },\n",
    "             \"required\": [\"location\"]\n",
    "         }\n",
    "     }\n",
    "     ]\n",
    ")\n",
    "output"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f399a1d",
   "metadata": {},
   "source": [
    "See the extra `additional_kwargs` that is passed back to us? We can take that and pass it to an external API to get data. It saves the hassle of doing output parsing."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2b70f23",
   "metadata": {},
   "source": [
    "### **Text Embedding Model**\n",
    "Change your text into a vector (a series of numbers that hold the semantic 'meaning' of your text). Mainly used when comparing two pieces of text together.\n",
    "\n",
    "*BTW: Semantic means 'relating to meaning in language or logic.'*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "1655de82",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.embeddings import OpenAIEmbeddings\n",
    "\n",
    "embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "a2c85e7e",
   "metadata": {},
   "outputs": [],
   "source": [
    "text = \"Hi! It's time for the beach\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "ddc5a368",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Here's a sample: [-0.00019600906371495047, -0.0031846734422911363, -0.0007734206914647714, -0.019472001962491232, -0.015092319017854244]...\n",
      "Your embedding is length 1536\n"
     ]
    }
   ],
   "source": [
    "text_embedding = embeddings.embed_query(text)\n",
    "print (f\"Here's a sample: {text_embedding[:5]}...\")\n",
    "print (f\"Your embedding is length {len(text_embedding)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c38fe99f",
   "metadata": {},
   "source": [
    "## Prompts - Text generally used as instructions to your model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8b9318ed",
   "metadata": {},
   "source": [
    "### **Prompt**\n",
    "What you'll pass to the underlying model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "2d270239",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "The statement is incorrect. Tomorrow is Tuesday, not Wednesday.\n"
     ]
    }
   ],
   "source": [
    "from langchain.llms import OpenAI\n",
    "\n",
    "llm = OpenAI(model_name=\"text-davinci-003\", openai_api_key=openai_api_key)\n",
    "\n",
    "# I like to use three double quotation marks for my prompts because it's easier to read\n",
    "prompt = \"\"\"\n",
    "Today is Monday, tomorrow is Wednesday.\n",
    "\n",
    "What is wrong with that statement?\n",
    "\"\"\"\n",
    "\n",
    "print(llm(prompt))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "74988254",
   "metadata": {},
   "source": [
    "### **Prompt Template**\n",
    "An object that helps create prompts based on a combination of user input, other non-static information and a fixed template string.\n",
    "\n",
    "Think of it as an [f-string](https://realpython.com/python-f-strings/) in python but for prompts\n",
    "\n",
    "*Advanced: Check out LangSmithHub(https://smith.langchain.com/hub) for many more communit prompt templates*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "abcc212d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Final Prompt: \n",
      "I really want to travel to Rome. What should I do there?\n",
      "\n",
      "Respond in one short sentence\n",
      "\n",
      "-----------\n",
      "LLM Output: Visit the Colosseum, the Vatican, and the Trevi Fountain.\n"
     ]
    }
   ],
   "source": [
    "from langchain.llms import OpenAI\n",
    "from langchain import PromptTemplate\n",
    "\n",
    "llm = OpenAI(model_name=\"text-davinci-003\", openai_api_key=openai_api_key)\n",
    "\n",
    "# Notice \"location\" below, that is a placeholder for another value later\n",
    "template = \"\"\"\n",
    "I really want to travel to {location}. What should I do there?\n",
    "\n",
    "Respond in one short sentence\n",
    "\"\"\"\n",
    "\n",
    "prompt = PromptTemplate(\n",
    "    input_variables=[\"location\"],\n",
    "    template=template,\n",
    ")\n",
    "\n",
    "final_prompt = prompt.format(location='Rome')\n",
    "\n",
    "print (f\"Final Prompt: {final_prompt}\")\n",
    "print (\"-----------\")\n",
    "print (f\"LLM Output: {llm(final_prompt)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed40bac2",
   "metadata": {},
   "source": [
    "### **Example Selectors**\n",
    "An easy way to select from a series of examples that allow you to dynamic place in-context information into your prompt. Often used when your task is nuanced or you have a large list of examples.\n",
    "\n",
    "Check out different types of example selectors [here](https://python.langchain.com/docs/modules/model_io/prompts/example_selectors/)\n",
    "\n",
    "If you want an overview on why examples are important (prompt engineering), check out [this video](https://www.youtube.com/watch?v=dOxUroR57xs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "aaf36cd9",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/gregorykamradt/opt/anaconda3/lib/python3.9/site-packages/deeplake/util/check_latest_version.py:32: UserWarning: A newer version of deeplake (3.7.2) is available. It's recommended that you update to the latest version using `pip install -U deeplake`.\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "from langchain.prompts.example_selector import SemanticSimilarityExampleSelector\n",
    "from langchain.vectorstores import Chroma\n",
    "from langchain.embeddings import OpenAIEmbeddings\n",
    "from langchain.prompts import FewShotPromptTemplate, PromptTemplate\n",
    "from langchain.llms import OpenAI\n",
    "\n",
    "llm = OpenAI(model_name=\"text-davinci-003\", openai_api_key=openai_api_key)\n",
    "\n",
    "example_prompt = PromptTemplate(\n",
    "    input_variables=[\"input\", \"output\"],\n",
    "    template=\"Example Input: {input}\\nExample Output: {output}\",\n",
    ")\n",
    "\n",
    "# Examples of locations that nouns are found\n",
    "examples = [\n",
    "    {\"input\": \"pirate\", \"output\": \"ship\"},\n",
    "    {\"input\": \"pilot\", \"output\": \"plane\"},\n",
    "    {\"input\": \"driver\", \"output\": \"car\"},\n",
    "    {\"input\": \"tree\", \"output\": \"ground\"},\n",
    "    {\"input\": \"bird\", \"output\": \"nest\"},\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "12b4798b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# SemanticSimilarityExampleSelector will select examples that are similar to your input by semantic meaning\n",
    "\n",
    "example_selector = SemanticSimilarityExampleSelector.from_examples(\n",
    "    # This is the list of examples available to select from.\n",
    "    examples, \n",
    "    \n",
    "    # This is the embedding class used to produce embeddings which are used to measure semantic similarity.\n",
    "    OpenAIEmbeddings(openai_api_key=openai_api_key), \n",
    "    \n",
    "    # This is the VectorStore class that is used to store the embeddings and do a similarity search over.\n",
    "    Chroma, \n",
    "    \n",
    "    # This is the number of examples to produce.\n",
    "    k=2\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "2cf30107",
   "metadata": {},
   "outputs": [],
   "source": [
    "similar_prompt = FewShotPromptTemplate(\n",
    "    # The object that will help select examples\n",
    "    example_selector=example_selector,\n",
    "    \n",
    "    # Your prompt\n",
    "    example_prompt=example_prompt,\n",
    "    \n",
    "    # Customizations that will be added to the top and bottom of your prompt\n",
    "    prefix=\"Give the location an item is usually found in\",\n",
    "    suffix=\"Input: {noun}\\nOutput:\",\n",
    "    \n",
    "    # What inputs your prompt will receive\n",
    "    input_variables=[\"noun\"],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "369442bb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Give the location an item is usually found in\n",
      "\n",
      "Example Input: tree\n",
      "Example Output: ground\n",
      "\n",
      "Example Input: bird\n",
      "Example Output: nest\n",
      "\n",
      "Input: plant\n",
      "Output:\n"
     ]
    }
   ],
   "source": [
    "# Select a noun!\n",
    "my_noun = \"plant\"\n",
    "# my_noun = \"student\"\n",
    "\n",
    "print(similar_prompt.format(noun=my_noun))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "9bb910f2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "' pot'"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "llm(similar_prompt.format(noun=my_noun))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8474c91d",
   "metadata": {},
   "source": [
    "### **Output Parsers Method 1: Prompt Instructions & String Parsing**\n",
    "A helpful way to format the output of a model. Usually used for structured output. LangChain has a bunch more output parsers listed on their [documentation](https://python.langchain.com/docs/modules/model_io/output_parsers).\n",
    "\n",
    "Two big concepts:\n",
    "\n",
    "**1. Format Instructions** - A autogenerated prompt that tells the LLM how to format it's response based off your desired result\n",
    "\n",
    "**2. Parser** - A method which will extract your model's text output into a desired structure (usually json)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "58353756",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.output_parsers import StructuredOutputParser, ResponseSchema\n",
    "from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate\n",
    "from langchain.llms import OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "ee36f881",
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = OpenAI(model_name=\"text-davinci-003\", openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "fa59be3f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# How you would like your response structured. This is basically a fancy prompt template\n",
    "response_schemas = [\n",
    "    ResponseSchema(name=\"bad_string\", description=\"This a poorly formatted user input string\"),\n",
    "    ResponseSchema(name=\"good_string\", description=\"This is your response, a reformatted response\")\n",
    "]\n",
    "\n",
    "# How you would like to parse your output\n",
    "output_parser = StructuredOutputParser.from_response_schemas(response_schemas)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "d1079f0a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The output should be a markdown code snippet formatted in the following schema, including the leading and trailing \"```json\" and \"```\":\n",
      "\n",
      "```json\n",
      "{\n",
      "\t\"bad_string\": string  // This a poorly formatted user input string\n",
      "\t\"good_string\": string  // This is your response, a reformatted response\n",
      "}\n",
      "```\n"
     ]
    }
   ],
   "source": [
    "# See the prompt template you created for formatting\n",
    "format_instructions = output_parser.get_format_instructions()\n",
    "print (format_instructions)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "9aaae5be",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "You will be given a poorly formatted string from a user.\n",
      "Reformat it and make sure all the words are spelled correctly\n",
      "\n",
      "The output should be a markdown code snippet formatted in the following schema, including the leading and trailing \"```json\" and \"```\":\n",
      "\n",
      "```json\n",
      "{\n",
      "\t\"bad_string\": string  // This a poorly formatted user input string\n",
      "\t\"good_string\": string  // This is your response, a reformatted response\n",
      "}\n",
      "```\n",
      "\n",
      "% USER INPUT:\n",
      "welcom to califonya!\n",
      "\n",
      "YOUR RESPONSE:\n",
      "\n"
     ]
    }
   ],
   "source": [
    "template = \"\"\"\n",
    "You will be given a poorly formatted string from a user.\n",
    "Reformat it and make sure all the words are spelled correctly\n",
    "\n",
    "{format_instructions}\n",
    "\n",
    "% USER INPUT:\n",
    "{user_input}\n",
    "\n",
    "YOUR RESPONSE:\n",
    "\"\"\"\n",
    "\n",
    "prompt = PromptTemplate(\n",
    "    input_variables=[\"user_input\"],\n",
    "    partial_variables={\"format_instructions\": format_instructions},\n",
    "    template=template\n",
    ")\n",
    "\n",
    "promptValue = prompt.format(user_input=\"welcom to califonya!\")\n",
    "\n",
    "print(promptValue)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "b116bb23",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'```json\\n{\\n\\t\"bad_string\": \"welcom to califonya!\", \\n\\t\"good_string\": \"Welcome to California!\"\\n}\\n```'"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "llm_output = llm(promptValue)\n",
    "llm_output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "985aa814",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'bad_string': 'welcom to califonya!', 'good_string': 'Welcome to California!'}"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "output_parser.parse(llm_output)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07045ae3",
   "metadata": {},
   "source": [
    "### **Output Parsers Method 2: OpenAI Fuctions**\n",
    "When OpenAI released function calling, the game changed. This is recommended method when starting out.\n",
    "\n",
    "They trained models specifically for outputing structured data. It became super easy to specify a Pydantic schema and get a structured output.\n",
    "\n",
    "There are many ways to define your schema, I prefer using Pydantic Models because of how organized they are. Feel free to reference OpenAI's [documention](https://platform.openai.com/docs/guides/gpt/function-calling) for other methods.\n",
    "\n",
    "In order to use this method you'll need to use a model that supports [function calling](https://openai.com/blog/function-calling-and-other-api-updates#:~:text=Developers%20can%20now%20describe%20functions%20to%20gpt%2D4%2D0613%20and%20gpt%2D3.5%2Dturbo%2D0613%2C). I'll use `gpt4-0613`\n",
    "\n",
    "**Example 1: Simple**\n",
    "\n",
    "Let's get started by defining a simple model for us to extract from."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "3593699b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.pydantic_v1 import BaseModel, Field\n",
    "from typing import Optional\n",
    "\n",
    "class Person(BaseModel):\n",
    "    \"\"\"Identifying information about a person.\"\"\"\n",
    "\n",
    "    name: str = Field(..., description=\"The person's name\")\n",
    "    age: int = Field(..., description=\"The person's age\")\n",
    "    fav_food: Optional[str] = Field(None, description=\"The person's favorite food\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17033d15",
   "metadata": {},
   "source": [
    "Then let's create a chain (more on this later) that will do the extracting for us"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "60b7be09",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Person(name='Sally, Joey, Caroline', age=13, fav_food='spinach')"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain.chains.openai_functions import create_structured_output_chain\n",
    "\n",
    "llm = ChatOpenAI(model='gpt-4-0613', openai_api_key=openai_api_key)\n",
    "\n",
    "chain = create_structured_output_chain(Person, llm, prompt)\n",
    "chain.run(\n",
    "    \"Sally is 13, Joey just turned 12 and loves spinach. Caroline is 10 years older than Sally.\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "37370210",
   "metadata": {},
   "source": [
    "Notice how we only have data on one person from that list? That is because we didn't specify we wanted multiple. Let's change our schema to specify that we want a list of people if possible."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "df4ad5e6",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Sequence\n",
    "\n",
    "class People(BaseModel):\n",
    "    \"\"\"Identifying information about all people in a text.\"\"\"\n",
    "\n",
    "    people: Sequence[Person] = Field(..., description=\"The people in the text\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa2bc127",
   "metadata": {},
   "source": [
    "Now we'll call for People rather than Person"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "5ba430d5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "People(people=[Person(name='Sally', age=13, fav_food=None), Person(name='Joey', age=12, fav_food='spinach'), Person(name='Caroline', age=23, fav_food=None)])"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain = create_structured_output_chain(People, llm, prompt)\n",
    "chain.run(\n",
    "    \"Sally is 13, Joey just turned 12 and loves spinach. Caroline is 10 years older than Sally.\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12db9b8b",
   "metadata": {},
   "source": [
    "Let's do some more parsing with it\n",
    "\n",
    "**Example 2: Enum**\n",
    "\n",
    "Now let's parse when a product from a list is mentioned"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "6616a735",
   "metadata": {},
   "outputs": [],
   "source": [
    "import enum\n",
    "\n",
    "llm = ChatOpenAI(model='gpt-4-0613', openai_api_key=openai_api_key)\n",
    "\n",
    "class Product(str, enum.Enum):\n",
    "    CRM = \"CRM\"\n",
    "    VIDEO_EDITING = \"VIDEO_EDITING\"\n",
    "    HARDWARE = \"HARDWARE\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "a5250ff5",
   "metadata": {},
   "outputs": [],
   "source": [
    "class Products(BaseModel):\n",
    "    \"\"\"Identifying products that were mentioned in a text\"\"\"\n",
    "\n",
    "    products: Sequence[Product] = Field(..., description=\"The products mentioned in a text\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "dd7e0bbf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Products(products=[<Product.CRM: 'CRM'>, <Product.HARDWARE: 'HARDWARE'>, <Product.VIDEO_EDITING: 'VIDEO_EDITING'>])"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain = create_structured_output_chain(Products, llm, prompt)\n",
    "chain.run(\n",
    "    \"The CRM in this demo is great. Love the hardware. The microphone is also cool. Love the video editing\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b43cec2",
   "metadata": {},
   "source": [
    "## Indexes - Structuring documents to LLMs can work with them"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3f904e9",
   "metadata": {},
   "source": [
    "### **Document Loaders**\n",
    "Easy ways to import data from other sources. Shared functionality with [OpenAI Plugins](https://openai.com/blog/chatgpt-plugins) [specifically retrieval plugins](https://github.com/openai/chatgpt-retrieval-plugin)\n",
    "\n",
    "See a [big list](https://python.langchain.com/en/latest/modules/indexes/document_loaders.html) of document loaders here. A bunch more on [Llama Index](https://llamahub.ai/) as well."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d4719d4",
   "metadata": {},
   "source": [
    "**HackerNews**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "ba88e05b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import HNLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "ee693520",
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = HNLoader(\"https://news.ycombinator.com/item?id=34422627\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "88d89ad7",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "e814f930",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Found 76 comments\n",
      "Here's a sample:\n",
      "\n",
      "Ozzie_osman 8 months ago  \n",
      "             | next [–] \n",
      "\n",
      "LangChain is awesome. For people not sure what it's doing, large language models (LLMs) are very Ozzie_osman 8 months ago  \n",
      "             | parent | next [–] \n",
      "\n",
      "Also, another library to check out is GPT Index (https://github.com/jerryjliu/gpt_index)\n"
     ]
    }
   ],
   "source": [
    "print (f\"Found {len(data)} comments\")\n",
    "print (f\"Here's a sample:\\n\\n{''.join([x.page_content[:150] for x in data[:2]])}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c564583f",
   "metadata": {},
   "source": [
    "**Books from Gutenberg Project**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "72964fb8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import GutenbergLoader\n",
    "\n",
    "loader = GutenbergLoader(\"https://www.gutenberg.org/cache/epub/2148/pg2148.txt\")\n",
    "\n",
    "data = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "id": "47140a26",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "      At Paris, just after dark one gusty evening in the autumn of 18-,\r\n",
      "\n",
      "\n",
      "      I was enjoying the twofold luxury of meditation \n"
     ]
    }
   ],
   "source": [
    "print(data[0].page_content[1855:1984])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7f1386b0",
   "metadata": {},
   "source": [
    "**URLs and webpages**\n",
    "\n",
    "Let's try it out with [Paul Graham's website](http://www.paulgraham.com/)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "46a54e7d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'New: \\n\\nHow to Do Great Work |\\nRead |\\nWill |\\nTruth\\n\\n\\n\\n\\n\\nWant to start a startup? Get funded by Y Combinator.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n© mmxxiii pg'"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain.document_loaders import UnstructuredURLLoader\n",
    "\n",
    "urls = [\n",
    "    \"http://www.paulgraham.com/\",\n",
    "]\n",
    "\n",
    "loader = UnstructuredURLLoader(urls=urls)\n",
    "\n",
    "data = loader.load()\n",
    "\n",
    "data[0].page_content"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e9601db",
   "metadata": {},
   "source": [
    "### **Text Splitters**\n",
    "Often times your document is too long (like a book) for your LLM. You need to split it up into chunks. Text splitters help with this.\n",
    "\n",
    "There are many ways you could split your text into chunks, experiment with [different ones](https://python.langchain.com/en/latest/modules/indexes/text_splitters.html) to see which is best for you."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "95713e57",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.text_splitter import RecursiveCharacterTextSplitter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "id": "a54455f5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "You have 1 document\n"
     ]
    }
   ],
   "source": [
    "# This is a long document we can split up.\n",
    "with open('data/PaulGrahamEssays/worked.txt') as f:\n",
    "    pg_work = f.read()\n",
    "    \n",
    "print (f\"You have {len([pg_work])} document\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "id": "d19acb18",
   "metadata": {},
   "outputs": [],
   "source": [
    "text_splitter = RecursiveCharacterTextSplitter(\n",
    "    # Set a really small chunk size, just to show.\n",
    "    chunk_size = 150,\n",
    "    chunk_overlap  = 20,\n",
    ")\n",
    "\n",
    "texts = text_splitter.create_documents([pg_work])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "id": "e3090f05",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "You have 610 documents\n"
     ]
    }
   ],
   "source": [
    "print (f\"You have {len(texts)} documents\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "87a0f45a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Preview:\n",
      "February 2021Before college the two main things I worked on, outside of school,\n",
      "were writing and programming. I didn't write essays. I wrote what \n",
      "\n",
      "beginning writers were supposed to write then, and probably still\n",
      "are: short stories. My stories were awful. They had hardly any plot,\n"
     ]
    }
   ],
   "source": [
    "print (\"Preview:\")\n",
    "print (texts[0].page_content, \"\\n\")\n",
    "print (texts[1].page_content)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad9e670d",
   "metadata": {},
   "source": [
    "There are a ton of different ways to do text splitting and it really depends on your retrieval strategy and application design. Check out more splitters [here](https://python.langchain.com/docs/modules/data_connection/document_transformers/)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f85defb",
   "metadata": {},
   "source": [
    "### **Retrievers**\n",
    "Easy way to combine documents with language models.\n",
    "\n",
    "There are many different types of retrievers, the most widely supported is the VectoreStoreRetriever"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "8cccbd82",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import TextLoader\n",
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "from langchain.vectorstores import FAISS\n",
    "from langchain.embeddings import OpenAIEmbeddings\n",
    "\n",
    "loader = TextLoader('data/PaulGrahamEssays/worked.txt')\n",
    "documents = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "id": "1dab1c20",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get your splitter ready\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)\n",
    "\n",
    "# Split your docs into texts\n",
    "texts = text_splitter.split_documents(documents)\n",
    "\n",
    "# Get embedding engine ready\n",
    "embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)\n",
    "\n",
    "# Embedd your texts\n",
    "db = FAISS.from_documents(texts, embeddings)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "id": "e62372be",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Init your retriever. Asking for just 1 document back\n",
    "retriever = db.as_retriever()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "e0534bbd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "VectorStoreRetriever(tags=['FAISS'], vectorstore=<langchain.vectorstores.faiss.FAISS object at 0x7f8389169070>)"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "retriever"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "3846a3b5",
   "metadata": {},
   "outputs": [],
   "source": [
    "docs = retriever.get_relevant_documents(\"what types of things did the author want to build?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "db383cc8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "standards; what was the point? No one else wanted one either, so\n",
      "off they went. That was what happened to systems work.I wanted not just to build things, but to build things that would\n",
      "last.In this di\n",
      "\n",
      "much of it in grad school.Computer Science is an uneasy alliance between two halves, theory\n",
      "and systems. The theory people prove things, and the systems people\n",
      "build things. I wanted to build things. \n"
     ]
    }
   ],
   "source": [
    "print(\"\\n\\n\".join([x.page_content[:200] for x in docs[:2]]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24193139",
   "metadata": {},
   "source": [
    "### **VectorStores**\n",
    "Databases to store vectors. Most popular ones are [Pinecone](https://www.pinecone.io/) & [Weaviate](https://weaviate.io/). More examples on OpenAIs [retriever documentation](https://github.com/openai/chatgpt-retrieval-plugin#choosing-a-vector-database). [Chroma](https://www.trychroma.com/) & [FAISS](https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/) are easy to work with locally.\n",
    "\n",
    "Conceptually, think of them as tables w/ a column for embeddings (vectors) and a column for metadata.\n",
    "\n",
    "Example\n",
    "\n",
    "| Embedding      | Metadata |\n",
    "| ----------- | ----------- |\n",
    "| [-0.00015641732898075134, -0.003165106289088726, ...]      | {'date' : '1/2/23}       |\n",
    "| [-0.00035465431654651654, 1.4654131651654516546, ...]   | {'date' : '1/3/23}        |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "id": "3c5533ad",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import TextLoader\n",
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "from langchain.vectorstores import FAISS\n",
    "from langchain.embeddings import OpenAIEmbeddings\n",
    "\n",
    "loader = TextLoader('data/PaulGrahamEssays/worked.txt')\n",
    "documents = loader.load()\n",
    "\n",
    "# Get your splitter ready\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)\n",
    "\n",
    "# Split your docs into texts\n",
    "texts = text_splitter.split_documents(documents)\n",
    "\n",
    "# Get embedding engine ready\n",
    "embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "id": "661fdf19",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "You have 78 documents\n"
     ]
    }
   ],
   "source": [
    "print (f\"You have {len(texts)} documents\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "id": "e99ac0ea",
   "metadata": {},
   "outputs": [],
   "source": [
    "embedding_list = embeddings.embed_documents([text.page_content for text in texts])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "id": "89e7758c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "You have 78 embeddings\n",
      "Here's a sample of one: [-0.001058628615053026, -0.01118234211553424, -0.012874804746266883]...\n"
     ]
    }
   ],
   "source": [
    "print (f\"You have {len(embedding_list)} embeddings\")\n",
    "print (f\"Here's a sample of one: {embedding_list[0][:3]}...\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ac358c5",
   "metadata": {},
   "source": [
    "Your vectorstore store your embeddings (☝️) and make them easily searchable"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f9b9b79b",
   "metadata": {},
   "source": [
    "## Memory\n",
    "Helping LLMs remember information.\n",
    "\n",
    "Memory is a bit of a loose term. It could be as simple as remembering information you've chatted about in the past or more complicated information retrieval.\n",
    "\n",
    "We'll keep it towards the Chat Message use case. This would be used for chat bots.\n",
    "\n",
    "There are many types of memory, explore [the documentation](https://python.langchain.com/en/latest/modules/memory/how_to_guides.html) to see which one fits your use case."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f43b49da",
   "metadata": {},
   "source": [
    "### Chat Message History"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "id": "893a18c1",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.memory import ChatMessageHistory\n",
    "from langchain.chat_models import ChatOpenAI\n",
    "\n",
    "chat = ChatOpenAI(temperature=0, openai_api_key=openai_api_key)\n",
    "\n",
    "history = ChatMessageHistory()\n",
    "\n",
    "history.add_ai_message(\"hi!\")\n",
    "\n",
    "history.add_user_message(\"what is the capital of france?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "id": "a2949fda",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[AIMessage(content='hi!'),\n",
       " HumanMessage(content='what is the capital of france?')]"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "history.messages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "id": "9b74d5cf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AIMessage(content='The capital of France is Paris.')"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ai_response = chat(history.messages)\n",
    "ai_response"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "id": "529e168f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[AIMessage(content='hi!'),\n",
       " HumanMessage(content='what is the capital of france?'),\n",
       " AIMessage(content='The capital of France is Paris.')]"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "history.add_ai_message(ai_response.content)\n",
    "history.messages"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f29fc79c",
   "metadata": {},
   "source": [
    "## Chains ⛓️⛓️⛓️\n",
    "Combining different LLM calls and action automatically\n",
    "\n",
    "Ex: Summary #1, Summary #2, Summary #3 > Final Summary\n",
    "\n",
    "Check out [this video](https://www.youtube.com/watch?v=f9_BWhCI4Zo&t=2s) explaining different summarization chain types\n",
    "\n",
    "There are [many applications of chains](https://python.langchain.com/en/latest/modules/chains/how_to_guides.html) search to see which are best for your use case.\n",
    "\n",
    "We'll cover two of them:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c34ba415",
   "metadata": {},
   "source": [
    "### 1. Simple Sequential Chains\n",
    "\n",
    "Easy chains where you can use the output of an LLM as an input into another. Good for breaking up tasks (and keeping your LLM focused)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "id": "79fc0950",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.llms import OpenAI\n",
    "from langchain.chains import LLMChain\n",
    "from langchain.prompts import PromptTemplate\n",
    "from langchain.chains import SimpleSequentialChain\n",
    "\n",
    "llm = OpenAI(temperature=1, openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "id": "43d4494a",
   "metadata": {},
   "outputs": [],
   "source": [
    "template = \"\"\"Your job is to come up with a classic dish from the area that the users suggests.\n",
    "% USER LOCATION\n",
    "{user_location}\n",
    "\n",
    "YOUR RESPONSE:\n",
    "\"\"\"\n",
    "prompt_template = PromptTemplate(input_variables=[\"user_location\"], template=template)\n",
    "\n",
    "# Holds my 'location' chain\n",
    "location_chain = LLMChain(llm=llm, prompt=prompt_template)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "id": "b6c8e00f",
   "metadata": {},
   "outputs": [],
   "source": [
    "template = \"\"\"Given a meal, give a short and simple recipe on how to make that dish at home.\n",
    "% MEAL\n",
    "{user_meal}\n",
    "\n",
    "YOUR RESPONSE:\n",
    "\"\"\"\n",
    "prompt_template = PromptTemplate(input_variables=[\"user_meal\"], template=template)\n",
    "\n",
    "# Holds my 'meal' chain\n",
    "meal_chain = LLMChain(llm=llm, prompt=prompt_template)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "id": "7e0b83f2",
   "metadata": {},
   "outputs": [],
   "source": [
    "overall_chain = SimpleSequentialChain(chains=[location_chain, meal_chain], verbose=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "id": "7d19c64d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new SimpleSequentialChain chain...\u001b[0m\n",
      "\u001b[36;1m\u001b[1;3m\n",
      "A classic dish from Rome is Spaghetti alla Carbonara, featuring egg, Parmesan cheese, black pepper, and pancetta or guanciale.\u001b[0m\n",
      "\u001b[33;1m\u001b[1;3m\n",
      "Ingredients:\n",
      "- 8oz spaghetti \n",
      "- 4 tablespoons olive oil\n",
      "- 4oz diced pancetta or guanciale\n",
      "- 2 cloves garlic, minced\n",
      "- 2 eggs, lightly beaten\n",
      "- 2 tablespoons parsley, chopped \n",
      "- ½ cup grated Parmesan \n",
      "- Salt and black pepper to taste\n",
      "\n",
      "Instructions:\n",
      "1. Bring a pot of salted water to a boil and add the spaghetti. Cook according to package directions. \n",
      "2. Meanwhile, add the olive oil to a large skillet over medium-high heat. Add the diced pancetta and garlic, and cook until pancetta is browned and garlic is fragrant.\n",
      "3. In a medium bowl, whisk together the eggs, parsley, Parmesan, and salt and pepper.\n",
      "4. Drain the cooked spaghetti and add it to the skillet with the pancetta and garlic. Remove from heat and pour the egg mixture over the spaghetti, stirring to combine. \n",
      "5. Serve the spaghetti alla carbonara with additional Parmesan cheese and black pepper.\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "review = overall_chain.run(\"Rome\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6191bf5",
   "metadata": {},
   "source": [
    "### 2. Summarization Chain\n",
    "\n",
    "Easily run through long numerous documents and get a summary. Check out [this video](https://www.youtube.com/watch?v=f9_BWhCI4Zo) for other chain types besides map-reduce"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "id": "6f218c3e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new MapReduceDocumentsChain chain...\u001b[0m\n",
      "\n",
      "\n",
      "\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mWrite a concise summary of the following:\n",
      "\n",
      "\n",
      "\"January 2017Because biographies of famous scientists tend to \n",
      "edit out their mistakes, we underestimate the \n",
      "degree of risk they were willing to take.\n",
      "And because anything a famous scientist did that\n",
      "wasn't a mistake has probably now become the\n",
      "conventional wisdom, those choices don't\n",
      "seem risky either.Biographies of Newton, for example, understandably focus\n",
      "more on physics than alchemy or theology.\n",
      "The impression we get is that his unerring judgment\n",
      "led him straight to truths no one else had noticed.\n",
      "How to explain all the time he spent on alchemy\n",
      "and theology?  Well, smart people are often kind of\n",
      "crazy.But maybe there is a simpler explanation. Maybe\"\n",
      "\n",
      "\n",
      "CONCISE SUMMARY:\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mWrite a concise summary of the following:\n",
      "\n",
      "\n",
      "\"the smartness and the craziness were not as separate\n",
      "as we think. Physics seems to us a promising thing\n",
      "to work on, and alchemy and theology obvious wastes\n",
      "of time. But that's because we know how things\n",
      "turned out. In Newton's day the three problems \n",
      "seemed roughly equally promising. No one knew yet\n",
      "what the payoff would be for inventing what we\n",
      "now call physics; if they had, more people would \n",
      "have been working on it. And alchemy and theology\n",
      "were still then in the category Marc Andreessen would \n",
      "describe as \"huge, if true.\"Newton made three bets. One of them worked. But \n",
      "they were all risky.\"\n",
      "\n",
      "\n",
      "CONCISE SUMMARY:\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n",
      "\n",
      "\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mWrite a concise summary of the following:\n",
      "\n",
      "\n",
      "\" Biographies of famous scientists often edit out their mistakes, giving readers the wrong impression that they never faced any risks to achieve successful results. An example of this is Newton, whose smartness is assumed to have led straight him to truths without any detours into alchemy or theology - despite the fact that he spent a lot of time on both fields. Maybe the simpler explanation is that he was willing to take risks, even if it means potentially making mistakes.\n",
      "\n",
      " In the 17th century, Newton took a risk and made three bets, one of which turned out to be a successful invention of what we now call physics. The other two bets were on less popular subjects of the time such as alchemy and theology. People did not know then what the payoff would be, but the bets still seemed relatively promising.\"\n",
      "\n",
      "\n",
      "CONCISE SUMMARY:\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "\" Biographies tend to omit famous scientists' mistakes from their stories, but Newton was willing to take risks and explore multiple fields to make his discoveries. He placed three risky bets, one of which resulted in the creation of physics as we know it today.\""
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain.chains.summarize import load_summarize_chain\n",
    "from langchain.document_loaders import TextLoader\n",
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "\n",
    "loader = TextLoader('data/PaulGrahamEssays/disc.txt')\n",
    "documents = loader.load()\n",
    "\n",
    "# Get your splitter ready\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=700, chunk_overlap=50)\n",
    "\n",
    "# Split your docs into texts\n",
    "texts = text_splitter.split_documents(documents)\n",
    "\n",
    "# There is a lot of complexity hidden in this one line. I encourage you to check out the video above for more detail\n",
    "chain = load_summarize_chain(llm, chain_type=\"map_reduce\", verbose=True)\n",
    "chain.run(texts)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "84f6193c",
   "metadata": {},
   "source": [
    "## Agents 🤖🤖\n",
    "\n",
    "Official LangChain Documentation describes agents perfectly (emphasis mine):\n",
    "> Some applications will require not just a predetermined chain of calls to LLMs/other tools, but potentially an **unknown chain** that depends on the user's input. In these types of chains, there is a “agent” which has access to a suite of tools. Depending on the user input, the agent can then **decide which, if any, of these tools to call**.\n",
    "\n",
    "\n",
    "Basically you use the LLM not just for text output, but also for decision making. The coolness and power of this functionality can't be overstated enough.\n",
    "\n",
    "Sam Altman emphasizes that the LLMs are good '[reasoning engine](https://www.youtube.com/watch?v=L_Guz73e6fw&t=867s)'. Agent take advantage of this."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3ce05d51",
   "metadata": {},
   "source": [
    "### Agents\n",
    "\n",
    "The language model that drives decision making.\n",
    "\n",
    "More specifically, an agent takes in an input and returns a response corresponding to an action to take along with an action input. You can see different types of agents (which are better for different use cases) [here](https://python.langchain.com/en/latest/modules/agents/agents/agent_types.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f696b65c",
   "metadata": {},
   "source": [
    "### Tools\n",
    "\n",
    "A 'capability' of an agent. This is an abstraction on top of a function that makes it easy for LLMs (and agents) to interact with it. Ex: Google search.\n",
    "\n",
    "This area shares commonalities with [OpenAI plugins](https://platform.openai.com/docs/plugins/introduction)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a11f8231",
   "metadata": {},
   "source": [
    "### Toolkit\n",
    "\n",
    "Groups of tools that your agent can select from\n",
    "\n",
    "Let's bring them all together:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "id": "67d5d82d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.agents import load_tools\n",
    "from langchain.agents import initialize_agent\n",
    "from langchain.llms import OpenAI\n",
    "import json\n",
    "\n",
    "llm = OpenAI(temperature=0, openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "id": "0ddcdbb9",
   "metadata": {},
   "outputs": [],
   "source": [
    "serpapi_api_key=os.getenv(\"SERP_API_KEY\", \"YourAPIKey\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "id": "44fad67f",
   "metadata": {},
   "outputs": [],
   "source": [
    "toolkit = load_tools([\"serpapi\"], llm=llm, serpapi_api_key=serpapi_api_key)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "id": "f544a74b",
   "metadata": {},
   "outputs": [],
   "source": [
    "agent = initialize_agent(toolkit, llm, agent=\"zero-shot-react-description\", verbose=True, return_intermediate_steps=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "id": "c4882754",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3m I should try to find out what band Natalie Bergman is a part of.\n",
      "Action: Search\n",
      "Action Input: \"Natalie Bergman band\"\u001b[0m\n",
      "Observation: \u001b[36;1m\u001b[1;3m['Natalie Bergman is an American singer-songwriter. She is one half of the duo Wild Belle, along with her brother Elliot Bergman. Her debut solo album, Mercy, was released on Third Man Records on May 7, 2021. She is based in Los Angeles.', 'Natalie Bergman type: American singer-songwriter.', 'Natalie Bergman main_tab_text: Overview.', 'Natalie Bergman kgmid: /m/0qgx4kh.', 'Natalie Bergman genre: Folk.', 'Natalie Bergman parents: Susan Bergman, Judson Bergman.', 'Natalie Bergman born: 1988 or 1989 (age 34–35).', 'Natalie Bergman is an American singer-songwriter. She is one half of the duo Wild Belle, along with her brother Elliot Bergman. Her debut solo album, Mercy, ...']\u001b[0m\n",
      "Thought:\u001b[32;1m\u001b[1;3m I should search for the first album of Wild Belle\n",
      "Action: Search\n",
      "Action Input: \"Wild Belle first album\"\u001b[0m\n",
      "Observation: \u001b[36;1m\u001b[1;3mIsles\u001b[0m\n",
      "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
      "Final Answer: Isles is the first album of the band that Natalie Bergman is a part of.\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "response = agent({\"input\":\"what was the first album of the\" \n",
    "                    \"band that Natalie Bergman is a part of?\"})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f9c30d2",
   "metadata": {},
   "source": [
    "![Wild Belle](data/WildBelle1.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14f4b368",
   "metadata": {},
   "source": [
    "🎵Enjoy🎵\n",
    "https://open.spotify.com/track/1eREJIBdqeCcqNCB1pbz7w?si=c014293b63c7478c"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
